It is commonly believed that in transfer learning, including more pre-training data translates to better performance. However, recent evidence suggests that removing data from the source dataset can actually help as well. In this work, we take a closer look at the role of the source dataset in transfer learning and present a framework for probing its impact on downstream performance. Our framework gives rise to new capabilities, such as pinpointing transfer learning brittleness and detecting pathologies such as data leakage and the presence of misleading examples in the source dataset. In particular, we demonstrate that removing detrimental datapoints identified by our framework improves transfer learning performance from ImageNet on a variety of target tasks. Code is available at https://github.com/madrylab/data-transfer
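As a concrete illustration of the counterfactual-style analysis such a framework implies, the sketch below estimates each source class's influence on downstream accuracy by pre-training on random subsets of the source data; classes whose presence correlates with worse transfer would be removal candidates. The helper `pretrain_and_transfer` and the subset-sampling scheme are hypothetical placeholders, not the authors' actual procedure.

```python
import random

def estimate_source_influence(source_classes, pretrain_and_transfer,
                              num_subsets=100, keep_frac=0.5):
    """Estimate each source class's influence on downstream accuracy.

    source_classes: list of class identifiers.
    pretrain_and_transfer(subset): assumed to pre-train on the given subset of
    source classes, fine-tune on the target task, and return downstream accuracy.
    """
    included = {c: [] for c in source_classes}
    excluded = {c: [] for c in source_classes}
    for _ in range(num_subsets):
        subset = set(random.sample(source_classes,
                                   int(keep_frac * len(source_classes))))
        acc = pretrain_and_transfer(subset)
        for c in source_classes:
            (included if c in subset else excluded)[c].append(acc)
    # Influence = mean accuracy with the class present minus mean accuracy without it.
    return {c: sum(included[c]) / max(len(included[c]), 1)
               - sum(excluded[c]) / max(len(excluded[c]), 1)
            for c in source_classes}

# Classes with negative estimated influence are candidates for removal before pre-training.
```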
We identify properties of universal adversarial perturbations (UAPs) that distinguish them from standard adversarial perturbations. Specifically, we show that targeted UAPs generated by projected gradient descent exhibit two human-aligned properties, semantic locality and spatial invariance, which standard targeted adversarial perturbations lack. We also demonstrate that UAPs contain significantly less generalizable signal than standard adversarial perturbations; that is, UAPs exploit non-robust features to a smaller extent than standard adversarial perturbations do.
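For reference, a minimal sketch of how a targeted UAP could be produced with projected gradient descent, as assumed here: a single shared perturbation is optimized to push every input toward one target class and projected back onto an L-infinity ball. The model, data loader, input size, and epsilon/step values are placeholders.

```python
import torch
import torch.nn.functional as F

def targeted_uap(model, loader, target_class, eps=10/255, step=1/255, epochs=5):
    """Optimize one shared perturbation that pushes all inputs toward target_class."""
    delta = torch.zeros(1, 3, 224, 224, requires_grad=True)  # shared perturbation
    target = torch.tensor([target_class])
    for _ in range(epochs):
        for x, _ in loader:
            loss = F.cross_entropy(model(x + delta), target.expand(x.size(0)))
            loss.backward()
            with torch.no_grad():
                delta -= step * delta.grad.sign()   # targeted: descend the loss
                delta.clamp_(-eps, eps)             # project onto the L-infinity ball
            delta.grad.zero_()
    return delta.detach()
```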
GPT-3 demonstrates the remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billions of tokens. Here we address some of the remaining issues from the GPT-3 paper, such as a non-English LM, the performance of models of different sizes, and the effect of recently introduced prompt optimization on in-context learning. To this end, we introduce HyperCLOVA, a Korean variant of GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific tokenization and training configuration, HyperCLOVA shows state-of-the-art in-context zero-shot and few-shot learning performance on various downstream tasks in Korean. In addition, we show the performance benefits of prompt-based learning and demonstrate how it can be integrated into the prompt engineering pipeline. We then discuss the possibility of realizing a No Code AI paradigm by introducing HyperCLOVA Studio, an interactive prompt engineering interface that provides AI prototyping capabilities to non-experts in ML. Finally, we demonstrate the potential of our methods with three successful in-house applications.
This paper proposes a new regularization algorithm referred to as macro-block dropout. Overfitting has been a difficult problem in training large neural network models. The dropout technique has proven to be simple yet very effective for regularization by preventing complex co-adaptations during training. In our work, we define a macro-block that contains a large number of units from the input to a Recurrent Neural Network (RNN). Rather than applying dropout to each unit, we apply random dropout to each macro-block. This algorithm has the effect of applying different dropout rates to each layer even while keeping a constant average dropout rate, which yields better regularization. In our experiments using a Recurrent Neural Network-Transducer (RNN-T), this algorithm shows relative Word Error Rate (WER) improvements of 4.30% and 6.13% over conventional dropout on LibriSpeech test-clean and test-other, respectively. With an Attention-based Encoder-Decoder (AED) model, it shows relative WER improvements of 4.36% and 5.85% over conventional dropout on the same test sets.
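A rough sketch of the macro-block idea, under the assumption that macro-blocks partition the feature dimension of the RNN input into a few contiguous blocks that are dropped as a whole; the block count, masking axis, and inverted-dropout rescaling below are illustrative choices, not the authors' exact recipe.

```python
import torch

def macro_block_dropout(x, num_blocks=4, p=0.2, training=True):
    """x: (batch, time, features) input to an RNN layer.

    Splits the feature dimension into num_blocks contiguous macro-blocks and
    drops each block as a whole, per example, with probability p.
    """
    if not training or p == 0.0:
        return x
    batch, _, feat = x.shape
    keep = (torch.rand(batch, num_blocks, device=x.device) >= p).to(x.dtype)
    # Expand each block decision to cover its slice of the feature dimension.
    mask = keep.repeat_interleave(feat // num_blocks, dim=1)
    if mask.size(1) < feat:  # features not divisible by num_blocks
        pad = mask[:, -1:].expand(batch, feat - mask.size(1))
        mask = torch.cat([mask, pad], dim=1)
    return x * mask.unsqueeze(1) / (1.0 - p)  # inverted-dropout rescaling
```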
Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction, but the main LM benchmarks are non-interactive: a system produces output without human intervention. To evaluate human-LM interaction, we develop a framework, Human-AI Language-based Interaction Evaluation (H-LINE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks ranging from goal-oriented to open-ended to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that better non-interactive performance does not always translate into better human-LM interaction and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.
Constrained reinforcement learning (RL) is an area of RL whose objective is to find an optimal policy that maximizes expected cumulative return while satisfying a given constraint. Most previous constrained RL works consider the expected cumulative sum cost as the constraint. However, optimization under this constraint cannot guarantee a target probability for the outage event in which the cumulative sum cost exceeds a given threshold. This paper proposes a framework, named Quantile Constrained RL (QCRL), that constrains the quantile of the distribution of the cumulative sum cost, which is a necessary and sufficient condition for satisfying the outage constraint. This is the first work that tackles the issue of applying the policy gradient theorem to the quantile and provides theoretical results for approximating the gradient of the quantile. Based on the derived theoretical results and the technique of the Lagrange multiplier, we construct a constrained RL algorithm named Quantile Constrained Policy Optimization (QCPO). We use distributional RL with the Large Deviation Principle (LDP) to estimate the quantile and tail probability of the cumulative sum cost for the implementation of QCPO. The implemented algorithm satisfies the outage probability constraint after the training period.
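A simplified sketch of the Lagrangian relaxation the abstract points to: a multiplier penalizes violation of the constraint that a chosen quantile of the cumulative cost stays below a threshold, and the multiplier is updated by dual ascent. The empirical quantile used here is a stand-in for the paper's distributional-RL / LDP estimator.

```python
import numpy as np

def lagrangian_update(returns, costs, lam, quantile_level, threshold, lr_lam=0.01):
    """One dual-ascent step for the quantile constraint Quantile(C) <= threshold.

    returns, costs: arrays of per-episode return and cumulative-sum cost.
    quantile_level: e.g. 1 - target outage probability.
    """
    cost_quantile = np.quantile(costs, quantile_level)          # empirical stand-in
    objective = np.mean(returns) - lam * (cost_quantile - threshold)
    lam = max(0.0, lam + lr_lam * (cost_quantile - threshold))  # dual ascent on lambda
    return objective, lam
```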
The recent advent of play-to-earn (P2E) systems in massively multiplayer online role-playing games (MMORPGs) has made in-game goods interchangeable with real-world value more than ever before. Goods in P2E MMORPGs can be directly exchanged for cryptocurrencies such as Bitcoin, Ethereum, or Klaytn via blockchain networks. Unlike traditional in-game goods, once P2E goods have been written to the blockchain, they cannot be restored by the game operation teams, even in cases of chargeback fraud such as payment fraud, cancellation, or refund. To tackle this problem, we propose a novel chargeback fraud prediction method, PU GNN, which leverages graph attention networks with a PU loss to capture both the players' in-game behavior and their P2E token transaction patterns. With the adoption of a modified GraphSMOTE, the proposed model handles the imbalanced label distribution in chargeback fraud datasets. Experiments on two real-world P2E MMORPG datasets demonstrate that PU GNN achieves superior performance over previously suggested methods.
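For intuition about the PU (positive-unlabeled) part of the loss, the sketch below applies the standard non-negative PU risk estimator to the fraud scores a graph model might produce; the class prior `pi` and the use of this particular estimator are assumptions, and the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def nn_pu_loss(scores, is_positive, pi=0.05):
    """Non-negative PU risk on fraud scores.

    scores: (N,) logits from a graph model; is_positive: (N,) bool mask of
    confirmed fraud; pi: assumed prior probability of fraud.
    """
    pos, unl = scores[is_positive], scores[~is_positive]
    risk_pos = F.softplus(-pos).mean()          # positives scored as positive
    risk_pos_neg = F.softplus(pos).mean()       # positives scored as negative
    risk_unl_neg = F.softplus(unl).mean()       # unlabeled scored as negative
    neg_risk = risk_unl_neg - pi * risk_pos_neg
    return pi * risk_pos + torch.clamp(neg_risk, min=0.0)  # non-negative correction
```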
Recent deep learning models have achieved high performance in speech enhancement. However, obtaining fast and low-complexity models without noticeable performance degradation remains a challenge. Previous knowledge distillation studies for speech enhancement could not solve this problem because their output distillation methods do not fit the speech enhancement task in some aspects. In this study, we propose multi-view attention transfer (MV-AT), a feature-based distillation, to obtain efficient speech enhancement models in the time domain. Based on a multi-view feature extraction model, MV-AT transfers the multi-view knowledge of the teacher network to the student network without additional parameters. Experimental results show that the proposed method consistently improves the performance of student models of various sizes on the Valentini and deep noise suppression (DNS) datasets. MANNER-S-8.1GF, a lightweight model for efficient deployment obtained with our proposed method, uses 15.4x and 4.71x fewer parameters and floating-point operations (FLOPs), respectively, than a baseline model with similar performance.
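A minimal sketch in the spirit of feature-based attention transfer: attention maps derived from intermediate teacher and student features are matched with an L2 loss, adding no parameters to the student. The single-view map below is illustrative; the multi-view aggregation that gives MV-AT its name is not reproduced here.

```python
import torch
import torch.nn.functional as F

def attention_map(feat):
    """feat: (batch, channels, time) -> L2-normalized per-frame attention (batch, time)."""
    att = feat.pow(2).mean(dim=1)
    return F.normalize(att, dim=1)

def attention_transfer_loss(student_feats, teacher_feats):
    """Match attention maps of paired intermediate features with an L2 loss."""
    return sum(F.mse_loss(attention_map(s), attention_map(t))
               for s, t in zip(student_feats, teacher_feats))
```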
In this paper, we present a data-driven skill learning approach for solving highly dynamic manipulation tasks entirely from offline teleoperated play data. We collect a large and diverse batch of dexterous and agile manipulation behaviors with a bilateral teleoperation system that provides direct force feedback to the operator. We jointly learn a state-conditioned latent skill distribution and a skill decoder network, in the form of a goal-conditioned policy and a skill-conditioned state transition dynamics model. This allows one to perform model-based online planning as well as offline planning in the learned skill space to accomplish any given downstream task at test time. We present simulated and real-world bimanual manipulation experiments showing that sequences of force-controlled dynamic manipulation skills can be composed in real time to successfully reconfigure a box to randomly selected target positions and orientations; see the supplementary video at https://youtu.be/la5b236ilzm.
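As an illustration of planning in a learned skill space, the sketch below optimizes a latent skill sequence with the cross-entropy method against a skill-conditioned dynamics model and returns the first skill for decoding into low-level actions. All module interfaces (`dynamics`, `cost`, the skill dimensionality) are hypothetical, not the paper's implementation.

```python
import torch

def plan_in_skill_space(state, goal, dynamics, cost, horizon=5, skill_dim=8,
                        iters=10, samples=256, elites=32):
    """Cross-entropy-method search over a latent skill sequence.

    state: (1, state_dim); dynamics(s, z) predicts the next state under skill z;
    cost(s, goal) scores how far a predicted state is from the goal.
    """
    mean = torch.zeros(horizon, skill_dim)
    std = torch.ones(horizon, skill_dim)
    for _ in range(iters):
        z = mean + std * torch.randn(samples, horizon, skill_dim)  # candidate skill plans
        s = state.expand(samples, -1)
        total = torch.zeros(samples)
        for t in range(horizon):
            s = dynamics(s, z[:, t])      # skill-conditioned transition model
            total = total + cost(s, goal)
        elite = z[total.topk(elites, largest=False).indices]       # lowest-cost plans
        mean, std = elite.mean(0), elite.std(0)
    return mean[0]  # first skill of the plan; a goal-conditioned decoder turns it into actions
```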
Neural network quantization aims to transform the high-precision weights and activations of a given neural network into low-precision weights/activations, reducing memory usage and computation while preserving the performance of the original model. However, extreme quantization (1-bit weights / 1-bit activations) of compactly designed backbone architectures (e.g., MobileNets), which are often used for edge-device deployment, results in severe performance degradation. This paper proposes a novel quantization-aware training (QAT) method that can effectively alleviate performance degradation even under extreme quantization by focusing on the inter-weight dependencies within each layer. To minimize the quantization impact of each weight on the others, we perform an orthonormal transformation of the weights of each layer by training an input-dependent correlation matrix and an importance vector, such that each weight is disentangled from the others. We then quantize the weights based on their importance so as to minimize the loss of information from the original weights/activations. We further perform progressive layer-wise quantization from the bottom layer to the top, so that the quantization of each layer reflects the quantized distributions of the weights and activations of the preceding layers. We validate the effectiveness of our method on various benchmark datasets against strong neural quantization baselines, showing that it alleviates the performance degradation on ImageNet and successfully preserves the full-precision model performance with compact backbone networks on CIFAR-100.
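A loose illustration of the two mechanics named above, namely an orthogonal transformation of a layer's weights obtained from a learned matrix and straight-through binarization of the transformed weights. Orthogonalizing via a QR decomposition is only one possible choice, and the input-dependent correlation matrix and importance-based ordering of the actual method are not reproduced here; this is a sketch of the mechanics, not the paper's algorithm.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        return w.sign() * w.abs().mean()      # 1-bit weights with a layer-wise scale
    @staticmethod
    def backward(ctx, grad_output):
        return grad_output                    # straight-through gradient estimator

def quantize_layer(weight, learned_matrix):
    """weight: (out, in); learned_matrix: (out, out) trainable parameter."""
    q, _ = torch.linalg.qr(learned_matrix)    # orthogonalize the learned matrix
    w_rot = q @ weight                        # decorrelating rotation of the weights
    return q.t() @ BinarizeSTE.apply(w_rot)   # binarize in the rotated basis, rotate back
```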